InforLorV4, Main, Exploration, bibRecord, 001694

Adaptation de la matrice de covariance pour l'apprentissage par renforcement direct

Internal identifier: 001694 (Main/Exploration); previous: 001693; next: 001695

Authors: Olivier Sigaud [France]; Freek Stulp [France]

Source:

Revue d'intelligence artificielle [0992-499X]; 2013.

RBID : Pascal:13-0216767

French descriptors

Pascal (Inist)
- Adaptation, Intelligence artificielle, Boîte noire, Apprentissage renforcé, Estimation statistique, Mise à jour, Robotique, Adressage, Politique, Commande stochastique, Commande optimale, Contrôle optimal, Matrice covariance, Modélisation, Optimisation, Approche probabiliste, Méthode moyenne, Analyse statistique, Fonction coût, Entropie, Méthode matricielle, Algorithme évolutionniste, Intégrale parcours, Variance.
Wicri:
- topic : Intelligence artificielle, Robotique, Politique.

English descriptors

KwdEn:
- Adaptation, Addressing, Artificial intelligence, Averaging method, Black box, Cost function, Covariance matrix, Entropy, Evolutionary algorithm, Matrix method, Modeling, Optimal control, Optimal control (mathematics), Optimization, Path integral, Policy, Probabilistic approach, Reinforcement learning, Robotics, Statistical analysis, Statistical estimation, Stochastic control, Updating, Variance.

Abstract

There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI² is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI² as a member of the wider family of methods which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. We compare PI² to other members of the same family, the 'Cross-Entropy Method' and 'Covariance Matrix Adaptation - Evolutionary Strategy', at the conceptual level and in terms of performance. The comparison suggests the derivation of a novel algorithm which we call PI²-CMA, for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI²-CMA's main advantage is that it determines the magnitude of the exploration noise automatically. We illustrate this advantage on a non-trivial simulated robotics experiment.
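
The probability-weighted averaging idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a diagonal Gaussian exploration distribution, exponentiated normalized costs as weights, and a spread update taken around the previous mean. The names `weighted_update`, `quadratic_cost`, and the parameter `h` are hypothetical and chosen only for illustration.

```python
import math
import random


def quadratic_cost(theta):
    """Toy cost function (an assumption for this sketch): squared norm."""
    return sum(x * x for x in theta)


def weighted_update(mean, sigma, cost, n_samples=20, h=10.0, rng=None):
    """One probability-weighted averaging step (illustrative sketch only).

    mean, sigma: per-dimension mean and standard deviation of the
    exploration distribution (diagonal Gaussian, a simplification).
    h: hypothetical parameter controlling how sharply low costs are favored.
    """
    rng = rng if rng is not None else random.Random()
    dim = len(mean)
    # Sample candidate parameter vectors around the current mean.
    samples = [[rng.gauss(mean[d], sigma[d]) for d in range(dim)]
               for _ in range(n_samples)]
    costs = [cost(s) for s in samples]
    c_min, c_max = min(costs), max(costs)
    span = (c_max - c_min) or 1.0  # avoid division by zero if all costs match
    # Lower cost gives a higher weight; weights are normalized to sum to 1.
    raw = [math.exp(-h * (c - c_min) / span) for c in costs]
    total = sum(raw)
    weights = [r / total for r in raw]
    # New mean: weighted average of the sampled parameters.
    new_mean = [sum(w * s[d] for w, s in zip(weights, samples))
                for d in range(dim)]
    # New exploration magnitude: weighted spread around the previous mean.
    new_sigma = [math.sqrt(sum(w * (s[d] - mean[d]) ** 2
                               for w, s in zip(weights, samples)))
                 for d in range(dim)]
    return new_mean, new_sigma


if __name__ == "__main__":
    rng = random.Random(0)
    mean, sigma = [5.0, -3.0], [1.0, 1.0]
    for _ in range(30):
        mean, sigma = weighted_update(mean, sigma, quadratic_cost, rng=rng)
    print(mean, sigma)
```

In this sketch the exploration magnitude `sigma` is re-estimated from the same weighted samples at every step rather than fixed by hand, which is the kind of automatic adaptation the abstract attributes to PI²-CMA. A full covariance matrix would replace the per-dimension spread in a more faithful version.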

Affiliations:

Links to previous steps (curation, corpus, ...)

to stream PascalFrancis, to step Corpus: 000061
to stream PascalFrancis, to step Curation: 000946
to stream PascalFrancis, to step Checkpoint: 000040
to stream Main, to step Merge: 001711
to stream Main, to step Curation: 001694

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="fr" level="a">Adaptation de la matrice de covariance pour l'apprentissage par renforcement direct</title>
<author><name sortKey="Sigaud, Olivier" sort="Sigaud, Olivier" uniqKey="Sigaud O" first="Olivier" last="Sigaud">Olivier Sigaud</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Institut des Systèmes Intelligents et de Robotique Université Pierre et Marie Curie CNRS UMR 7222 4, place Jussieu</s1>
<s2>75252 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Stulp, Freek" sort="Stulp, Freek" uniqKey="Stulp F" first="Freek" last="Stulp">Freek Stulp</name>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Cognitive Robotics École Nationale Supérieure de Techniques Avancées (ENSTA-ParisTech) 32, Boulevard Victor</s1>
<s2>75015 Paris</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3"><inist:fA14 i1="03"><s1>FLOWERS Research Team INRIA Bordeaux Sud-Ouest 351, Cours de la Libération</s1>
<s2>33405 Talence</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Nouvelle-Aquitaine</region>
<region type="old region" nuts="2">Aquitaine</region>
<settlement type="city">Talence</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">13-0216767</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0216767 INIST</idno>
<idno type="RBID">Pascal:13-0216767</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000061</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000946</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000040</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000040</idno>
<idno type="wicri:doubleKey">0992-499X:2013:Sigaud O:adaptation:de:la</idno>
<idno type="wicri:Area/Main/Merge">001711</idno>
<idno type="wicri:Area/Main/Curation">001694</idno>
<idno type="wicri:Area/Main/Exploration">001694</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="fr" level="a">Adaptation de la matrice de covariance pour l'apprentissage par renforcement direct</title>
<author><name sortKey="Sigaud, Olivier" sort="Sigaud, Olivier" uniqKey="Sigaud O" first="Olivier" last="Sigaud">Olivier Sigaud</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Institut des Systèmes Intelligents et de Robotique Université Pierre et Marie Curie CNRS UMR 7222 4, place Jussieu</s1>
<s2>75252 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Stulp, Freek" sort="Stulp, Freek" uniqKey="Stulp F" first="Freek" last="Stulp">Freek Stulp</name>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Cognitive Robotics École Nationale Supérieure de Techniques Avancées (ENSTA-ParisTech) 32, Boulevard Victor</s1>
<s2>75015 Paris</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3"><inist:fA14 i1="03"><s1>FLOWERS Research Team INRIA Bordeaux Sud-Ouest 351, Cours de la Libération</s1>
<s2>33405 Talence</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Nouvelle-Aquitaine</region>
<region type="old region" nuts="2">Aquitaine</region>
<settlement type="city">Talence</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint><date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Adaptation</term>
<term>Addressing</term>
<term>Artificial intelligence</term>
<term>Averaging method</term>
<term>Black box</term>
<term>Cost function</term>
<term>Covariance matrix</term>
<term>Entropy</term>
<term>Evolutionary algorithm</term>
<term>Matrix method</term>
<term>Modeling</term>
<term>Optimal control</term>
<term>Optimal control (mathematics)</term>
<term>Optimization</term>
<term>Path integral</term>
<term>Policy</term>
<term>Probabilistic approach</term>
<term>Reinforcement learning</term>
<term>Robotics</term>
<term>Statistical analysis</term>
<term>Statistical estimation</term>
<term>Stochastic control</term>
<term>Updating</term>
<term>Variance</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Adaptation</term>
<term>Intelligence artificielle</term>
<term>Boîte noire</term>
<term>Apprentissage renforcé</term>
<term>Estimation statistique</term>
<term>Mise à jour</term>
<term>Robotique</term>
<term>Adressage</term>
<term>Politique</term>
<term>Commande stochastique</term>
<term>Commande optimale</term>
<term>Contrôle optimal</term>
<term>Matrice covariance</term>
<term>Modélisation</term>
<term>Optimisation</term>
<term>Approche probabiliste</term>
<term>Méthode moyenne</term>
<term>Analyse statistique</term>
<term>Fonction coût</term>
<term>Entropie</term>
<term>Méthode matricielle</term>
<term>Algorithme évolutionniste</term>
<term>Intégrale parcours</term>
<term>Variance</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Intelligence artificielle</term>
<term>Robotique</term>
<term>Politique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI<sup>2</sup>
 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI<sup>2</sup>
 as a member of the wider family of methods which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. We compare PI<sup>2</sup>
 to other members of the same family - the 'Cross-Entropy Method' and 'Covariance Matrix Adaptation - Evolutionary Strategy' - at the conceptual level and in terms of performance. The comparison suggests the derivation of a novel algorithm which we call PI<sup>2</sup>
-CMA for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI<sup>2</sup>
-CMA's main advantage is that it determines the magnitude of the exploration noise automatically. We illustrate this advantage on a non-trivial simulated robotics experiment.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Aquitaine</li>
<li>Nouvelle-Aquitaine</li>
<li>Île-de-France</li>
</region>
<settlement><li>Paris</li>
<li>Talence</li>
</settlement>
</list>
<tree><country name="France"><region name="Île-de-France"><name sortKey="Sigaud, Olivier" sort="Sigaud, Olivier" uniqKey="Sigaud O" first="Olivier" last="Sigaud">Olivier Sigaud</name>
</region>
<name sortKey="Stulp, Freek" sort="Stulp, Freek" uniqKey="Stulp F" first="Freek" last="Stulp">Freek Stulp</name>
<name sortKey="Stulp, Freek" sort="Stulp, Freek" uniqKey="Stulp F" first="Freek" last="Stulp">Freek Stulp</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001694 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001694 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:13-0216767
   |texte=   Adaptation de la matrice de covariance pour l'apprentissage par renforcement direct
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022

	Serveur d'exploration sur la recherche en informatique en Lorraine
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
